361 - RECITATION 1
REVIEW
Creating vector and Matrix in R
Vector is a basic data structure in R. It contains element of the same type. The data types can be logical, integer, double, character, complex or raw.
- Vectors are generally created using the c() function.
- A vector’s type can be checked with the typeof() function.
- Another important property of a vector is its length. This is the number of elements in the vector and can be checked with the function length().
1. Create the vectors below.
a= (1,2,3,. . . ,28,29,30)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30
b = (1,2,3,. . . ,10,1,2,3,. . . ,10,1,2,3,. . . ,10)
## [1] 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5
## [26] 6 7 8 9 10
c= (1,1,1, 2,2,2, 3,3,3, 4,4,4,…, 10,10,10)
## [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9
## [26] 9 9 10 10 10
d = (20,18,16,. . . ,6,4,2)
## [1] 20 18 16 14 12 10 8 6 4 2
Matrix is a two dimensional data structure in R programming. Matrix is similar to vector but additionally contains the dimension attribute.
All attributes of an object can be checked with the attributes() function (dimension can be checked directly with the dim() function).
We can check if a variable is a matrix or not with the class() function.
Diag() shows the diagonal elements of a matrix.
det() gives us determinant of a matrix.
t() gives the transpose of a matrix.
solve() takes the inverse of a matrix.
Matrix multiplication is %*%
2. Create the matrix below.
- 3 x 3 matrix with every entry equals to 2.
## [,1] [,2] [,3]
## [1,] 2 2 2
## [2,] 2 2 2
## [3,] 2 2 2
- 6 x 6 identity matrix
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1 0 0 0 0 0
## [2,] 0 1 0 0 0 0
## [3,] 0 0 1 0 0 0
## [4,] 0 0 0 1 0 0
## [5,] 0 0 0 0 1 0
## [6,] 0 0 0 0 0 1
- 4 x 4 matrix X:
\[ X = \begin{bmatrix}5 & 1 & -3 & 7 \\ 3 & -2 & 0 & 1\\ 2 & 0 & 8 & 6\\ 7 & -1 & 0 & 1 \end{bmatrix} \]
## [,1] [,2] [,3] [,4]
## [1,] 5 1 -3 7
## [2,] 3 -2 0 1
## [3,] 2 0 8 6
## [4,] 7 -1 0 1
Matrix Operations
A) Use the previously created matrix X.
- Check dimension and class of X
## [1] "matrix" "array"
## [1] 4 4
## $dim
## [1] 4 4
- Compute determinant of matrix X.
## [1] 800
- Take transpose of X (Y = \(X^T\)).
## [,1] [,2] [,3] [,4]
## [1,] 5 3 2 7
## [2,] 1 -2 0 -1
## [3,] -3 0 8 0
## [4,] 7 1 6 1
- Take inverse of X (Z = \(X^{-1}\)).
## [,1] [,2] [,3] [,4]
## [1,] -0.01 -0.1025 -0.00375 0.195
## [2,] 0.04 -0.5900 0.01500 0.220
## [3,] -0.08 -0.0700 0.09500 0.060
## [4,] 0.11 0.1275 0.04125 -0.145
- Multiply X and Z.
## [,1] [,2] [,3] [,4]
## [1,] 1.000000e+00 0.000000e+00 0.000000e+00 2.220446e-16
## [2,] 5.551115e-17 1.000000e+00 -2.775558e-17 -8.326673e-17
## [3,] 1.110223e-16 -1.110223e-16 1.000000e+00 1.110223e-16
## [4,] 0.000000e+00 -5.551115e-17 0.000000e+00 1.000000e+00
B)Now suppose;
\[ A = \begin{bmatrix}1 & 2 & 5 \\ 2 & 3 & 6\\ -2 & 0 & 1\\ \end{bmatrix} \]
- Create matrix A.
## [,1] [,2] [,3]
## [1,] 1 2 5
## [2,] 2 3 6
## [3,] -2 0 1
- Write the diagonal elements of matrix A in a vector B.
## [1] 1 3 1
- Create a new matrix C with \(1^{st}\) and \(3^{rd}\) rows and \(2^{nd}\) and \(3^{rd}\) columns of A.
## [,1] [,2]
## [1,] 2 5
## [2,] 0 1
- Create a new matrix D by deleting 3rd row of A.
## [,1] [,2] [,3]
## [1,] 1 2 5
## [2,] 2 3 6
- Replace the third column of A by the sum of the second and third columns.
## [,1] [,2] [,3]
## [1,] 1 2 7
## [2,] 2 3 9
## [3,] -2 0 1
- Compute the mean of each column of A. (Without using loop)
## [1] 0.3333333 1.6666667 5.6666667
## [1] 0.3333333 1.6666667 5.6666667
## [1] 3.3333333 4.6666667 -0.3333333
Note That;
apply(X, MARGIN, FUN, …)
where:
X is an array or a matrix if the dimension of the array is 2;
MARGIN is a variable defining how the function is applied: when MARGIN=1, it applies over rows, whereas with MARGIN=2, it works over columns. Note that when you use the construct MARGIN=c(1,2), it applies to both rows and columns;
FUN, which is the function that you want to apply to the data. It can be any R function, including a User Defined Function (UDF).
Distribution functions
Every distribution has four associated functions whose prefix indicates the type of function and the suffix indicates the distribution. To exemplify the use of these functions, let use the normal (Gaussian) distribution. The four normal distribution functions are:
- dnorm: density function of the normal distribution
The probability density function (PDF, in short: density) indicates the probability of observing a measurement with a specific value and thus the integral over the density is always 1.
- pnorm: cumulative density function of the normal distribution
The cumulative density (CDF) function is a monotonically increasing function as it integrates over densities
- qnorm: quantile function of the normal distribution
The quantile function is simply the inverse of the cumulative density function (iCDF). Thus, the quantile function maps from probabilities to values. Let’s take a look at the quantile function for 𝑃[𝑋<=𝑥]
- rnorm: random sampling from the normal distribution
The random sampling function: rnorm
When we want to draw random samples from the normal distribution, you can use rnorm.
The full list of standard distributions available can be seen using “?distribution.”
Random Number Generation
R has functions to generate a random number from many standard distribution like uniform distribution, binomial distribution, normal distribution etc.
- Functions that generate random deviates start with the letter r. For example, runif() generates random numbers from a uniform distribution and rnorm() generates from a normal distribution.
## [1] 0.9532313
## [1] 0.8682768 0.1794216 0.8603041
## [1] 6.938969 5.528239 5.411266
Assume X ~ Normal(5, 9); generate a random sample of size n = 1000. (While generating use set.seed(361) command.)
## [1] 6.0846793 7.1054138 3.4195848 5.2180070 5.1573151 1.5472171
## [7] 8.2884016 4.8132810 6.0027304 8.4461702 3.2543662 2.7511491
## [13] 7.7312325 8.3522899 3.7435374 2.2771184 10.9100903 -0.7250312
## [19] 6.6288625 3.9624669
Compute following:
- Minimum & Maximum value and range
## [1] -4.204518
## [1] 15.22712
## [1] -4.204518 15.227121
- Mean, Median & Variance
## [1] 5.054446
## [1] 5.07124
## [1] 9.990544
- Order generated numbers in ascending order and print first and last 10.
## [1] -4.204518 -3.981195 -3.927701 -3.636500 -3.482656 -3.229263 -3.183069
## [8] -2.966523 -2.754209 -2.672590
## [1] 12.16955 12.39004 13.09075 13.10228 13.24375 13.49768 13.63390 13.66403
## [9] 14.74095 15.22712
Visualization
Plot the histogram and box plot of a vector that you generated in a single figure usning ggplot.
data_x<- as.data.frame(x) #to use ggplot, it is needed a dataframe
library(ggplot2)
h<-ggplot(data_x, aes(x=x)) +
geom_histogram(color="violet", fill="pink", bins = 30) +labs(title = "Histogram of normal distribution(n=1000)")
b<-ggplot(data_x, aes(x=x)) +
geom_boxplot(color="blue", fill="lightblue") +labs(title = "Boxplot of normal distribution(n=1000)")
library(ggpubr)
ggarrange(h,b)## [1] 6.0846793 7.1054138 3.4195848 5.2180070 5.1573151 1.5472171
## [7] 8.2884016 4.8132810 6.0027304 8.4461702 3.2543662 2.7511491
## [13] 7.7312325 8.3522899 3.7435374 2.2771184 10.9100903 -0.7250312
## [19] 6.6288625 3.9624669
data_y<- as.data.frame(y) #to use ggplot, it is needed a dataframe
library(ggplot2)
h<-ggplot(data_y, aes(x=y)) +
geom_histogram(color="violet", fill="pink", bins = 30) +labs(title = "Histogram of normal distribution(n=1000000)")
b<-ggplot(data_y, aes(x=y)) +
geom_boxplot(color="blue", fill="lightblue") +labs(title = "Boxplot of normal distribution(n=1000000)")
library(ggpubr)
ggarrange(h,b)As seen, as sample size increases, distribution of random variable looks more normal (more symmetrical)
TASK!
Generate a random sample of size 5000 from gamma distribution with shape parameter 2 and scale parameter 3, then plot the density of sample using ggplot.
## [1] 4.551160 8.219747 2.544232 17.281052 7.778567 2.112637
data_z<- as.data.frame(z) #to use ggplot, it is needed a dataframe
library(ggplot2)
h<-ggplot(data_z, aes(x=z)) +
geom_histogram(color="violet", fill="pink", bins = 30) +labs(title = "Histogram of gamma distribution")
b<-ggplot(data_z, aes(x=z)) +
geom_boxplot(color="blue", fill="lightblue") +labs(title = "Boxplot of gamma distribution")
library(ggpubr)
ggarrange(h,b)Conditionals
The syntax of if statement is:
If the test_expression is TRUE, the statement gets executed. But if it’s FALSE, nothing happens. test_expression can be a logical or numeric vector, but only the first element is taken into consideration. In the case of numeric vector, zero is taken as FALSE, rest as TRUE.
The else part is optional and is only evaluated if test_expression is FALSE. It is important to note that else must be in the same line as the closing braces of the if statement.
ifelse() function
ifelse(test_expression, x, y)
This returned vector has element from x if the corresponding value of test_expression is TRUE or from y if the corresponding value of test_expression is FALSE.
A)Generate a random sample of size 20 between 1 to 100. (Again, while generating use set.seed(361)) Then, count the number of even numbers in vector y.
## [1] 50 19 74 7 91 25 73 71 59 51 13 2 53 17 78 48 40 64 46 3
## [1] 8
## [1] 8
Loops and functions
for loop
It is a type of control statement that enables one to easily construct a loop that has to run statements or a set of statements multiple times. For loop is commonly used to iterate over items of a sequence. It is an entry controlled loop, in this loop the test condition is tested first, then the body of the loop is executed, the loop body would not be executed if the test condition is false.
while loop
It is a type of control statement which will run a statement or a set of statements repeatedly unless the given condition becomes false. It is also an entry controlled loop, in this loop the test condition is tested first, then the body of the loop is executed, the loop body would not be executed if the test condition is false.
functions
A function is a set of statements organized together to perform a specific task. R has a large number of in-built functions and the user can create their own functions.
In R, a function is an object so the R interpreter is able to pass control to the function, along with arguments that may be necessary for the function to accomplish the actions.
The function in turn performs its task and returns control to the interpreter as well as any result which may be stored in other objects.
Function Name − This is the actual name of the function. It is stored in R environment as an object with this name.
Arguments − An argument is a placeholder. When a function is invoked, you pass a value to the argument. Arguments are optional; that is, a function may contain no arguments. Also arguments can have default values.
Function Body − The function body contains a collection of statements that defines what the function does.
Return Value − The return value of a function is the last expression in the function body to be evaluated.
A) Write a for loop to calculate the sum of the positive integers up to a given integer.(Let’s say it is 20)
total <- 0
for(i in 1:20){
total <- total + i
cat(paste("Round: ", i, "\t", "Total: ", total, "\n"))
}## Round: 1 Total: 1
## Round: 2 Total: 3
## Round: 3 Total: 6
## Round: 4 Total: 10
## Round: 5 Total: 15
## Round: 6 Total: 21
## Round: 7 Total: 28
## Round: 8 Total: 36
## Round: 9 Total: 45
## Round: 10 Total: 55
## Round: 11 Total: 66
## Round: 12 Total: 78
## Round: 13 Total: 91
## Round: 14 Total: 105
## Round: 15 Total: 120
## Round: 16 Total: 136
## Round: 17 Total: 153
## Round: 18 Total: 171
## Round: 19 Total: 190
## Round: 20 Total: 210
B) Create an 6x6 upper triangular matrix of 1s. (Hint: Firstly, create a zero matrix then fill 1’s with loops.)
\[ A = \begin{bmatrix}1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 & 1 & 1\\ 0 & 0 & 1 & 1 & 1 & 1\\ 0 & 0 & 0 & 1 & 1 & 1\\ 0 & 0 & 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix} \]
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0 0 0 0 0 0
## [2,] 0 0 0 0 0 0
## [3,] 0 0 0 0 0 0
## [4,] 0 0 0 0 0 0
## [5,] 0 0 0 0 0 0
## [6,] 0 0 0 0 0 0
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1 1 1 1 1 1
## [2,] 0 1 1 1 1 1
## [3,] 0 0 1 1 1 1
## [4,] 0 0 0 1 1 1
## [5,] 0 0 0 0 1 1
## [6,] 0 0 0 0 0 1
C) Print integers from 1 to 8 and calculate 8! by using “while” loop.
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
#8!
a <- 1
b <- 8
res <- 1 # 0! = 1
while(a <= b){
res <- res * a
cat("step",a,":","\t","result =", res, "\n")
a = a + 1
}## step 1 : result = 1
## step 2 : result = 2
## step 3 : result = 6
## step 4 : result = 24
## step 5 : result = 120
## step 6 : result = 720
## step 7 : result = 5040
## step 8 : result = 40320
D) Calculate the sum of \(1^p + 2^p + ... + 10^p\) for p = 1, 2, 3, 4, 5, 6 using while loop and print all results.
p <- 1
while(p <= 6){
a <- 1
b <- 10
total <- 0
while(a <= b){
total <- total + a^p
a <- a + 1
}
cat("P:", p, "\t", "Total is", total, "\n")
p <- p + 1
}## P: 1 Total is 55
## P: 2 Total is 385
## P: 3 Total is 3025
## P: 4 Total is 25333
## P: 5 Total is 220825
## P: 6 Total is 1978405
E) Write a function to print the first 𝑛 elements of Fibonacci sequence (0,1,1,2,3,5,… etc, each number is the sum of the two numbers before it). Run this function for 𝑛=25.
n=25
fibonacci <- function(n) {
x <- NULL #an empty vector
x[1] <- 0 #write 0 as the first element of x
x[2] <- 1 #write 1 as the 2nd element of x
for (i in 3:n) {
x[i] <- x[i-1] + x[i-2]
}
return(x)
}
F <- fibonacci(n)
F## [1] 0 1 1 2 3 5 8 13 21 34 55 89
## [13] 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657
## [25] 46368
SUMMARY
Statistical Probability Functions
The following table describes functions related to probability distributions. For random number generators below, you can use set.seed(1234) or some other integer to create reproducible pseudo-random numbers.